Foundations of Machine Learning Frameworks
CSCN8010 - Winter 2024
Professor: Ran Feldesh
Student: Arcadio de Paula Fernandez
To create the graph below, we will use Matplotlib, a plotting library for Python. As data, we will use the classic Titanic dataset, which records each passenger's age, sex, ticket class, fare, survival status, and other fields.
For more information about the Titanic you can access the following link.
The graph is a histogram of passenger ages: each bar shows how many passengers fall within a given age range.
# Importing several libraries for data visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# Loading the Titanic dataset from seaborn's data repository
df = sns.load_dataset('titanic')
df.head()
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
# Selecting the 'age' column and dropping missing values with .dropna()
ages = df['age'].dropna()
# Setting the number of bins for the histogram
n_bins = 30
# Creating the histogram plot
plt.hist(ages, bins=n_bins, edgecolor="white")
# Setting labels and title
plt.xlabel('Age')
plt.ylabel('Number of Passengers')
plt.title('Histogram of Passenger Ages')
# Showing the plot
plt.show()
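To make the role of `n_bins` concrete, here is a minimal sketch of equal-width binning in plain Python. It is not Matplotlib's implementation (Matplotlib delegates the counting to `numpy.histogram`), and values that fall exactly on a bin edge may be assigned slightly differently because of floating-point rounding. The names `sample_ages` and `equal_width_counts` are hypothetical, and the sample values are made up for illustration rather than taken from the dataset.

```python
import math

# Hypothetical sample of ages (NaN marks a missing value); illustration only
sample_ages = [22.0, 38.0, 26.0, 35.0, 35.0, float('nan'), 54.0, 2.0, 27.0, 14.0]


def equal_width_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max].

    Assumes at least two distinct non-missing values, so the bin width is > 0.
    """
    clean = [v for v in values if not math.isnan(v)]  # same role as .dropna()
    lo, hi = min(clean), max(clean)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in clean:
        # The maximum value is placed in the last bin (its right edge is inclusive)
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges


counts, edges = equal_width_counts(sample_ages, n_bins=4)
print(counts)  # one count per bar
print(edges)   # n_bins + 1 bin boundaries
```

A larger `n_bins` gives narrower bars and more detail, at the cost of noisier counts per bar.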
The graph below is another histogram of passenger ages, this time drawn with Seaborn, another Python data visualization library. The bars are split by sex, and a smoothed density curve (KDE) is overlaid for each group.
import seaborn as sns
# Loading the Titanic dataset from seaborn's data repository
df = sns.load_dataset('titanic')
df.head()
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='age', kde=True, hue='sex')
plt.title('Age Distribution by Gender')
plt.show()
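The `kde=True` option overlays a kernel density estimate on the histogram. As a rough sketch of the idea, the snippet below builds a Gaussian KDE by hand: each observation contributes a small bell curve, and the estimate is their average. The function name `gaussian_kde`, the sample values, and the fixed `bandwidth=5.0` are assumptions for illustration only; Seaborn chooses the bandwidth automatically, so its curve will not match this one exactly.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centred on each observation
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical ages with missing values already removed; illustration only
sample_ages = [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0, 14.0]
density = gaussian_kde(sample_ages, bandwidth=5.0)

# Sanity check: a density should integrate to roughly 1 (Riemann sum on a wide grid)
step = 0.5
grid = [-40.0 + i * step for i in range(281)]  # covers -40 to 100
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

A smaller bandwidth follows the data more closely but produces a bumpier curve; a larger one smooths away detail.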
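The graphs below look at passenger survival: a pie chart of the share of passengers who survived and who died, and a scatter plot of age against fare colored by survival. Both are drawn with Plotly Express, another Python data visualization library.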
# Loading the Titanic dataset from seaborn's data repository into titanic_data
titanic_data = sns.load_dataset('titanic')
# Viewing the first 5 rows
titanic_data.head()
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
# kaleido is needed by Plotly's write_image() to export static images
%pip install -U kaleido
Requirement already satisfied: kaleido in c:\users\arcad\appdata\local\programs\python\python311\lib\site-packages (0.2.1) Note: you may need to restart the kernel to use updated packages.
import plotly.express as px
import plotly.offline as pyo
pyo.init_notebook_mode()
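# Mapping the 0/1 codes to the labels used as keys in color_discrete_map
titanic_data['SurvivalLabel'] = titanic_data['survived'].map({0: 'Not Survived', 1: 'Survived'})
fig_pie = px.pie(titanic_data, names='SurvivalLabel', color='SurvivalLabel', title='Passenger Survival',
                 color_discrete_map={'Not Survived': 'red', 'Survived': 'green'}, labels={'SurvivalLabel': 'Survival'})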
fig_pie.show()
fig_pie.write_image('pie_chart.png')

fig = px.scatter(titanic_data, x='fare', y='age', color='survived', size='fare')
fig.show()
fig.write_image('scatter.png')
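The pie chart above is a picture of a simple tally: count the rows in each survival category and divide by the total. A minimal sketch of that calculation in plain Python follows. The `survived` list holds made-up 0/1 codes for illustration, not values read from the dataset, and the label text mirrors the keys used in the color map.

```python
from collections import Counter

# Hypothetical 0/1 survival codes; illustration only
survived = [0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
labels = {0: 'Not Survived', 1: 'Survived'}

# Count the rows in each category, which is what each pie slice represents
counts = Counter(labels[code] for code in survived)
total = sum(counts.values())

# Convert counts to percentage shares of the whole
shares = {label: 100.0 * n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} passengers ({share:.1f}%)")
```

Checking these numbers against the chart's hover labels is a quick way to confirm that the category labels and colors are attached to the right slices.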
!jupyter nbconvert --to html "C:\Users\arcad\CSCN8010-labs\Lab2-Arca\Class 2_Lab_Arcadio_v3.ipynb" --output-dir ./docs/